SUMMARY

The nucleotide sequence of a genomic cDNA clone corresponding to the 5' terminal 
domain of mRNA D of the Beaudette strain of infectious bronchitis virus (IBV) has 
been determined. This region contains three open reading frames which predict 
polypeptides of molecular weights 6700 (6.7K), 7.4K and 12.4K. The predicted 12.4K
polypeptide has a codon usage very similar to that predicted for the products of the IBV 
nucleocapsid, membrane and spike genes. The sequence also predicts a hydrophobic, 
potentially membrane-anchoring, region in the N terminal half of the 12.4K
polypeptide, and a hydrophilic C terminus. 


Coronaviruses are enveloped viruses with a single-stranded RNA genome of positive polarity 
(Siddell et al., 1983; Sturman & Holmes, 1983). The genome of infectious bronchitis virus (IBV)
is about 20 kilobases in length (Stern & Kennedy, 1980a; Siddell et al., 1983). In IBV-infected
cells six major mRNA species are produced. These mRNAs, designated A to F, range in length 
from about 2 kb to genome length, and have been shown to share a common 3' terminus and 
form an overlapping or ‘nested’ set (Stern & Kennedy, 1980a, b) (see Fig. 1). Translation studies
in vitro have demonstrated that mRNAs A, C and E encode the three major viral proteins: the 
nucleocapsid protein, the membrane glycoprotein and the precursor to the spike or surface 
projection glycoprotein, respectively (Stern & Sefton, 1984). Sequencing of the IBV genome has 
shown that the coding sequences for these polypeptides lie largely within the ‘unique’ 5' terminal 
region of each mRNA species which is not present in the next smallest mRNA (Boursnell et al.,
1984, 1985; Binns et al., 1985). However, no specific translation products have, to date, been
detected from mRNAs B and D (Stern & Sefton, 1984; Boursnell & Brown, 1984). Sequencing 
studies of genomic RNA in the regions of the 5' terminal domains of mRNAs B and D have been 
carried out to determine whether these contain potential coding sequences. The 5' terminal 
sequence of mRNA B contains two open reading frames (ORFs) which potentially code for 
polypeptides of 7.5K and 9.5K (Boursnell & Brown, 1984). In this paper, we present the
sequence, obtained from genomic cDNA clones, of the ‘unique’ 5' terminal region of mRNA D. 

The isolation of the cDNA clone, pMB179, which contains these sequences, has already been 
described (Binns et al., 1985). Briefly, a 13 base oligonucleotide primer, complementary to 
sequences at the 5' end of mRNA C (Boursnell et al., 1984), was used to prime cDNA synthesis
from purified IBV Beaudette (Beaudette & Hudson, 1937) viral genomic RNA. One of the
clones obtained, pMB179, contained a 5.3 kb insert which DNA sequence analysis subsequently
showed had a 3' end 12 bases from the 5' end of the primer sequence. Prior to dideoxy sequencing
(Sanger et al., 1977; Bankier & Barrell, 1983), PstI and RsaI digests of pMB179 were subcloned
into PstI-digested M13mp11 and SmaI-digested, phosphatase-treated M13mp10, respectively.
DNA sequence data were also obtained in the region of mRNA D by sequencing of DNase I-treated
(Anderson, 1981) or sonicated (Deininger, 1983) fragments of pMB179 which had been
subcloned into M13mp10 as described by Binns et al. (1985). Fig. 1 shows the position of clone
pMB179 and marks the region of sequence presented in this paper. 
Fig. 1. Genomic organization of infectious bronchitis virus. The 3' co-terminal ‘nested’ set of mRNAs
is shown. At the top are shown the positions of the genes coding for the major structural components of 
the virion, the spike (S), membrane (M) and nucleocapsid (N) polypeptides. Also shown are the 
positions of the ‘homology regions’ which are sequences present in the genome at positions 
corresponding to the 5' termini of the bodies of the mRNAs. The position of clone pMB179 is shown,
with the region of sequence presented in Fig. 2 represented by a box. 


Seven hundred and fifty-five bases of sequence are presented here. These are shown in Fig. 2 
with a translation in single-letter amino acid code of the main ORFs. They extend from a 
sequence CTGAACAA at position 1, which differs by only one base from sequences which 
appear at the 5' ends of the bodies of mRNAs A, B, C (CTTAACAA) and is identical to that 
found in mRNA E (Brown & Boursnell, 1984; Boursnell et al., 1984, 1985; Boursnell & Brown,
1984; Binns et al., 1985), to an arbitrary position within the sequence of mRNA C. At position
596 is the sequence CTTAACAA, which probably marks the 5' end of the body of mRNA C. 
These two sequences lie 3783 and 3188 bases from the poly(A) tract at the 3' end of the viral 
genome. These sizes would represent the lengths of the bodies of mRNAs D and C without 
either leader sequence (Brown et al., 1984) or poly(A) tract, and therefore agree well with the
estimated size of these mRNAs of 4.1 and 3.4 kilobases (Boursnell & Brown, 1984). Thus, bases 1
to 596 of this sequence appear to represent the ‘unique’ 5' terminal domain of mRNA D which is 
not present in mRNA C. Bases 1 to 29 code for the COOH terminus of the spike gene and bases 
681 to 755 code for the NH2 terminus of the membrane protein gene (Binns et al., 1985;
Boursnell et al., 1984).
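
The agreement between these figures can be checked with a few lines of arithmetic. The sketch below, in Python, uses only the numbers quoted in the paragraph above; the variable names are illustrative, and since the lengths of the leader and of the poly(A) tract are not given here, the remainder it reports is simply what those two elements together would have to contribute for the sequence-derived and estimated sizes to coincide.

```python
# Distances (in bases) from the two homology-region sequences to the poly(A)
# tract, i.e. the lengths of the mRNA bodies, as quoted in the text.
body_length = {"D": 3783, "C": 3188}

# Estimated sizes of the intact mRNAs (in bases), as quoted in the text.
estimated_size = {"D": 4100, "C": 3400}

# The homology regions start at positions 1 and 596 of the presented sequence,
# so the body of mRNA D should be 595 bases longer than the body of mRNA C.
unique_region = body_length["D"] - body_length["C"]

# Bases left over for leader sequence plus poly(A) tract in each mRNA.
remainder = {name: estimated_size[name] - body_length[name] for name in body_length}

print(unique_region)
print(remainder)
```

The difference between the two body lengths equals the offset between positions 1 and 596, which is the internal consistency the paragraph relies on.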

There are three ORFs which lie in the 5' region of mRNA D. The first two non-overlapping 
ORFs, from bases 32 to 202 and 205 to 396, potentially code for polypeptides of 6.7K and 7.4K.
A third ORF, from bases 383 to 706, potentially coding for a polypeptide of 12.4K, overlaps the
second ORF by six amino acids and overlaps the coding sequences for the membrane 
glycoprotein by nine amino acids. Examination of the potential polypeptides encoded by these 
ORFs shows the 6.7K polypeptide to be neutral and hydrophobic whereas the 7.4K polypeptide
is acidic with an overall negative charge of 13. The 12.4K polypeptide would have a
hydrophobic N terminal domain and a hydrophilic C terminal domain. The sequences around 
the initiation codons of the two small ORFs, UNNAUGA and CNNAUGU, are used 
extremely rarely in functional eukaryotic initiation codons, but are the most common sequences 
found around ‘non-functional’ upstream AUGs (22% and 44% of mRNAs surveyed by Kozak, 
1983). The sequence flanking the initiation codon of the 12.4K ORF, GNNAUGA, is also fairly
rare as a functional initiation codon (2% of mRNAs surveyed) but is not classified as a
‘non-functional’ upstream AUG (Kozak, 1983). Examination of the codon usage of these three
potential polypeptides (Staden, 1984) shows that the 12.4K ORF has a codon usage very similar
S (COOH terminus): EQYRPKKSV**
6.7K ORF: MIQSPTSFLIVLILLWCKLV
  1 CTGAACAATA CAGACCTAAA AAGTCTGTTT GATGATCCAA AGTCCCACGT CCTTCCTAAT AGTATTAATT CTTCTTTGGT GTAAACTTGT 90
M41: T, AT, T

6.7K ORF: LSCFREFIIALQQLIQVLLQIINSNLQSRL
 91 ACTAAGTTGT TTTAGAGAGT TTATTATAGC GCTCCAACAA CTAATACAAG TTTTACTCCA AATTATCAAT AGTAACTTAC AGTCTAGACT 180
M41: C

6.7K ORF: TLWHSLD*
7.4K ORF: MLNLEVIIETGEQVIQKISFNL
181 GACCCTTTGG CACAGTCTAG ACTAATGTTA AACTTAGAAG TAATTATTGA AACTGGTGAG CAAGTGATTC AAAAAATCAG TTTCAATTTA 270
M41: T, C

7.4K ORF: QHISSVLNTEVFDPFDYCYYRGGNFWEIES
271 CAGCATATTT CAAGTGTATT AAACACAGAA GTATTTGATC CCTTTGACTA TTGTTATTAC AGAGGAGGTA ATTTTTGGGA AATAGAGTCA 360
M41: C

7.4K ORF: AEDCSGDDEFIE*
12.4K ORF: MMNLLNKSLEENGSFLTALYIIVG
361 GCTGAAGATT GTTCAGGTGA TGATGAATTT ATTGAATAAG TCGCTAGAGG AGAATGGAAG TTTTCTAACA GCGCTTTACA TAATTGTAGG 450
M41: A, T, T

12.4K ORF: FLALYLLGRALQAFVQAADACCLFWYTWVV
451 ATTTTTAGCA CTTTATCTTC TAGGTAGAGC ACTTCAAGCA TTTGTACAGG CTGCTGATGC TTGTTGTTTA TTTTGGTATA CATGGGTAGT 540

12.4K ORF: IPGAKGTAFVYKYTYGRKLNNPELEAVIVN
541 AATTCCAGGA GCTAAGGGTA CAGCCTTTGT ATACAAGTAT ACATATGGTA GAAAACTTAA CAATCCGGAA TTAGAAGCAG TTATTGTTAA 630

12.4K ORF: EFPKNGWNNKNPANFQDAQRDKLYS*
M (NH2 terminus): MPNETNCTLDFEQS
631 CGAGTTTCCT AAGAACGGTT GGAATAATAA AAATCCAGCA AATTTTCAAG ATGCCCAACG AGACAAATTG TACTCTTGAC TTTGAACAGT 720

M (NH2 terminus): VQLFKEYNLFI
721 CAGTTCAGCT TTTTAAAGAG TATAATTTAT TTATA 755

Fig. 2. 755 bases of DNA sequence from the IBV Beaudette genomic cDNA clone pMB179, 
representing the 5' terminal domain of IBV mRNA D. A translation in single-letter amino acid code is
shown above the three main open reading frames (ORFs). The ‘homology regions’ (see Fig. 1) are
underlined. Where the M41 sequence obtained overlaps the Beaudette sequence (bases 1 to 560) the 
differences are shown beneath the Beaudette sequence. In all cases the sequence has been completely 
determined on both strands. 


to that predicted for the other IBV polypeptides whose genes have been sequenced, but that the 
two smaller ORFs have not. These results suggest that the two small ORFs may not code for 
polypeptides in vivo but may only be chance ORFs. 
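
The ORF coordinates given above can be checked directly against the sequence of Fig. 2. The following Python sketch is an illustration only: the sequence is transcribed from Fig. 2 on the assumption that bases illegible in the figure agree with the translation printed above them, the coordinates are those quoted in the text, and `translate` is a hypothetical helper using the standard genetic code.

```python
from itertools import product

# Beaudette sequence transcribed from Fig. 2 (755 bases, 90 per row).
# Assumption: illegible bases were restored to agree with the printed translation.
ROWS = [
    "CTGAACAATA CAGACCTAAA AAGTCTGTTT GATGATCCAA AGTCCCACGT CCTTCCTAAT AGTATTAATT CTTCTTTGGT GTAAACTTGT",
    "ACTAAGTTGT TTTAGAGAGT TTATTATAGC GCTCCAACAA CTAATACAAG TTTTACTCCA AATTATCAAT AGTAACTTAC AGTCTAGACT",
    "GACCCTTTGG CACAGTCTAG ACTAATGTTA AACTTAGAAG TAATTATTGA AACTGGTGAG CAAGTGATTC AAAAAATCAG TTTCAATTTA",
    "CAGCATATTT CAAGTGTATT AAACACAGAA GTATTTGATC CCTTTGACTA TTGTTATTAC AGAGGAGGTA ATTTTTGGGA AATAGAGTCA",
    "GCTGAAGATT GTTCAGGTGA TGATGAATTT ATTGAATAAG TCGCTAGAGG AGAATGGAAG TTTTCTAACA GCGCTTTACA TAATTGTAGG",
    "ATTTTTAGCA CTTTATCTTC TAGGTAGAGC ACTTCAAGCA TTTGTACAGG CTGCTGATGC TTGTTGTTTA TTTTGGTATA CATGGGTAGT",
    "AATTCCAGGA GCTAAGGGTA CAGCCTTTGT ATACAAGTAT ACATATGGTA GAAAACTTAA CAATCCGGAA TTAGAAGCAG TTATTGTTAA",
    "CGAGTTTCCT AAGAACGGTT GGAATAATAA AAATCCAGCA AATTTTCAAG ATGCCCAACG AGACAAATTG TACTCTTGAC TTTGAACAGT",
    "CAGTTCAGCT TTTTAAAGAG TATAATTTAT TTATA",
]
SEQ = "".join(ROWS).replace(" ", "")

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(first: int, last: int) -> str:
    """Translate bases first..last of SEQ (1-based, inclusive)."""
    dna = SEQ[first - 1:last]
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# ORF coordinates as quoted in the text.
ORFS = {"6.7K": (32, 202), "7.4K": (205, 396), "12.4K": (383, 706)}

for name, (first, last) in ORFS.items():
    protein = translate(first, last)
    following_codon = SEQ[last:last + 3]  # should be a termination codon
    print(name, len(protein), protein[:10], following_codon)
```

Under these assumptions the three ORFs translate to 57, 64 and 108 residues and each is followed directly by a termination codon. The translation printed in Fig. 2 for the largest ORF begins with two methionines, reflecting an in-frame ATG at bases 380 to 382 immediately upstream of the one at 383.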

To investigate whether the upstream ORFs are conserved between different IBV strains we 
have sequenced a cDNA clone from another strain, M41 (Geilhausen et al., 1972), which covers
the region of sequence where these small ORFs occur. The M41 clone, 169, was made as 
described by Boursnell et al. (1984) and overlaps the sequences presented here from positions 1 
to 560. There are 12 base changes between the two strains. The bases altered in M41 in this 
region are shown beneath the Beaudette sequence in Fig. 2. The sizes and positions of the two 
(a) Alignment of the amino acid sequence of the 12.4K polypeptide from IBV mRNA D (upper sequence, residue scale 17 to 77) with the amino acid sequence of the 10.2K polypeptide from MHV-JHM mRNA 5 (lower sequence, residue scale 20 to 80).

(b) Alignment of the N terminal regions of IBV 9.5K (13-36), IBV 12.4K (13-36), MHV 15.2K (15-37) and MHV 10.2K (16-39).



Fig. 3. (a) Amino acid homology between IBV 12.4K predicted polypeptide and MHV-JHM 10.2K
predicted polypeptide. Plus signs show identical amino acids and minus signs show amino acids with
similar (Kanehisa, 1982) properties. (b) Comparison of the predicted amino acid sequences of the IBV
12.4K, MHV-JHM 15.2K and MHV-JHM 10.2K putative polypeptides with the IBV 9.5K putative
polypeptide. Amino acids boxed-in show residues identical or similar (Kanehisa, 1982) to those of the
IBV 9.5K sequence. The distances of the amino acids from the predicted N termini of the polypeptides
are shown in parentheses. 


small ORFs are conserved in the M41 sequence, but the two strains differ too little in this
region to indicate whether this conservation is significant. However, the 'homology
region' CTGAACAA, at position 1, is altered in M41 to CTTAACAA which is the form found 
in Beaudette at the 5' ends of the bodies of mRNAs A, B and C. Interestingly, this single base 
change results in the introduction of a termination codon (UAA) in the coding sequences for the 
M41 spike protein, which predicts that the M41 spike precursor would lack nine amino acids at 
the C terminus which are present in the Beaudette polypeptide. 
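
The effect of this single base change on the spike reading frame can be illustrated with a short Python sketch. It is a simplified illustration: only the change at position 3 is modelled (any other M41 differences in this stretch are ignored), the first 35 bases are transcribed from Fig. 2, and `translate_to_stop` is a hypothetical helper using the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_to_stop(dna: str, first: int) -> str:
    """Translate from base `first` (1-based) up to the first termination codon."""
    protein = []
    for i in range(first - 1, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Bases 1 to 35 of the Beaudette sequence (Fig. 2); the spike frame starts at base 3.
beaudette = "CTGAACAATACAGACCTAAAAAGTCTGTTTGATGA"

# Assumption: M41 is modelled as Beaudette with only the G -> T change at position 3.
m41 = beaudette[:2] + "T" + beaudette[3:]

beaudette_tail = translate_to_stop(beaudette, 3)
m41_tail = translate_to_stop(m41, 3)

print(beaudette_tail, len(beaudette_tail))
print(repr(m41_tail), len(beaudette_tail) - len(m41_tail))
```

With the change at position 3 the first codon of this stretch becomes TAA, so none of the nine residues encoded in Beaudette would be translated in M41.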

Two of the mRNAs from the mouse coronavirus MHV-JHM, mRNAs 4 and 5, also contain 
small ORFs which do not appear to code for any of the major structural components of the virion 
(Skinner & Siddell, 1985; Skinner et al., 1985). The amino acid sequences predicted from the
three ORFs in mRNA D and the two ORFs (7.5K and 9.5K) in mRNA B have therefore been
compared with the sequences predicted from the three ORFs in mRNAs 4 and 5 from MHV-
JHM using various computer programs (Staden, 1982; Kanehisa, 1982; Goad & Kanehisa,
1982). A homology was found between the 12.4K ORF in IBV mRNA D and the 10.2K ORF
from MHV-JHM mRNA 5 (Fig. 3a). The match is statistically significant, the score being
greater than four standard deviations away from that produced by comparing 100 random 
sequences of the same composition. The hydrophilicity plots (Kyte & Doolittle, 1982) of these 
two polypeptides are also similar, suggesting that they may be related or have a similar function. 
In addition there is some similarity between the N terminal regions of four of these putative 
small polypeptides. Fig. 3(b) shows these results. 
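
The kind of significance test described above, in which an observed score is compared with scores from random sequences of the same composition, can be sketched in Python as follows. This is a simplified stand-in and not the procedure of the programs cited: the score is a plain count of identical residues at the best ungapped offset, the function names are hypothetical, and because the MHV-JHM sequence is not reproduced here the example compares a segment of the IBV 12.4K polypeptide with itself as a positive control.

```python
import random
import statistics


def best_ungapped_identity(a: str, b: str) -> int:
    """Highest number of identical residues over all ungapped offsets of b against a."""
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        matches = 0
        for j, residue in enumerate(b):
            i = offset + j
            if 0 <= i < len(a) and a[i] == residue:
                matches += 1
        best = max(best, matches)
    return best


def shuffle_z_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Standard deviations separating the observed score from shuffled-sequence scores."""
    rng = random.Random(seed)
    observed = best_ungapped_identity(a, b)
    null_scores = []
    for _ in range(n_shuffles):
        letters = list(b)
        rng.shuffle(letters)  # same composition, random order
        null_scores.append(best_ungapped_identity(a, "".join(letters)))
    return (observed - statistics.mean(null_scores)) / statistics.stdev(null_scores)


# Segment of the IBV 12.4K polypeptide (from the translation in Fig. 2).
IBV_SEGMENT = "SFLTALYIIVGFLALYLLGRALQAFVQAADACCLFWYTWVVIPGAKGTAFVYKYTYGRKLNNPELEA"

print(shuffle_z_score(IBV_SEGMENT, IBV_SEGMENT))
```

Shuffling preserves amino acid composition, so a score lying several standard deviations above the shuffled distribution cannot be attributed to composition alone; a real comparison would substitute the second polypeptide for the self-comparison and would normally use a similarity matrix and allow gaps.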

The fact that the codon usage of the 12.4K putative polypeptide is very similar to that
predicted for the nucleocapsid, membrane and spike polypeptides strongly suggests that the
largest ORF in mRNA D does code for a product in vivo. It is not clear at the moment what, if
any, is the function of the two smaller ‘upstream’ ORFs, but it is interesting to note that both
mRNA B of IBV (Boursnell & Brown, 1984) and mRNA 5 of MHV-JHM (Skinner et al., 1985)
have 5' terminal regions containing two overlapping ORFs, and thus may code for more than
one polypeptide. At the moment it is not possible to say whether the 12.4K product of mRNA D
might be a structural component of the virion, but if it were it must only be present at very low
levels, since no polypeptide of this size has been detected in [3H]leucine-labelled preparations of
virus (Boursnell & Brown, 1984). 
The hydrophobic N terminus of the 12.4K polypeptide has a stretch of 21 uncharged amino
acids, enriched in hydrophobic residues, which could span the viral membrane, possibly acting
as a membrane-anchoring region. Two of the small polypeptides (10.2K and 15.2K) of
coronavirus MHV-JHM (Skinner et al., 1985; Skinner & Siddell, 1985) have similar
hydrophobic domains and it has been suggested that they may play a role in siting
membrane-bound transcription or replication complexes (Skinner & Siddell, 1985). The 12.4K polypeptide
of IBV may have a similar function but in view of the fact that these polypeptides are probably
not translated until the subgenomic mRNAs have already been transcribed, an involvement
with replication complexes, producing full-length viral RNA, seems the more likely of these two
suggestions. Another possibility is that they could be involved in a switch from transcription to
replication activities, which is suggested by the observation that, in MHV, late in infection the
genomic RNA is synthesized at a faster rate than the subgenomic RNAs (Brayton et al., 1984).
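
The uncharged stretch and the contrast between the two halves of the polypeptide can be examined with a short Python sketch. It rests on stated assumptions: the 12.4K sequence is taken from the translation in Fig. 2 starting at the ATG at base 383, residues D, E, K and R are treated as charged, the hydropathy values are those of the Kyte & Doolittle (1982) scale, and the function names are hypothetical.

```python
# Kyte & Doolittle (1982) hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Predicted 12.4K polypeptide (Fig. 2), numbered from the ATG at base 383.
P12_4K = (
    "MNLLNKSLEENGSFLTALYIIVG"
    "FLALYLLGRALQAFVQAADACCLFWYTWVV"
    "IPGAKGTAFVYKYTYGRKLNNPELEAVIVN"
    "EFPKNGWNNKNPANFQDAQRDKLYS"
)

CHARGED = set("DEKR")  # assumption: only these residues are counted as charged


def longest_uncharged_run(protein: str) -> tuple[int, int]:
    """Return (length, 1-based start) of the longest run without charged residues."""
    best = (0, 0)
    run_start = 0
    # A trailing charged sentinel closes a run that reaches the C terminus.
    for i, residue in enumerate(protein + "D"):
        if residue in CHARGED:
            if i - run_start > best[0]:
                best = (i - run_start, run_start + 1)
            run_start = i + 1
    return best


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle hydropathy of a segment."""
    return sum(KD[r] for r in segment) / len(segment)


half = len(P12_4K) // 2
run_length, run_start = longest_uncharged_run(P12_4K)
print(run_length, run_start)
print(mean_hydropathy(P12_4K[:half]), mean_hydropathy(P12_4K[half:]))
```

Under these assumptions the longest uncharged run is 21 residues long and begins at residue 11, and the N terminal half has the higher mean hydropathy, in line with the description above.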


We are grateful to Penny Gatter, Bridgette Britton, Anne Foulds and Ian Foulds for excellent technical
assistance. This research was carried out under Research Contract No. GBI-2-011-UK of the Biomolecular 
Engineering Programme of the Commission of the European Communities. 